Unsupervised False Friend Disambiguation Using Contextual Word Clusters and Parallel Word Alignments

Authors

  • Maryam Aminian
  • Mahmoud Ghoneim
  • Mona T. Diab
Abstract

Lexical false friends (FFs) are words that look the same but do not share the same meaning or lexical usage. FFs pose several challenges for statistical machine translation (SMT). We present a methodology that exploits word context modeling, together with information provided by word alignments, to identify false friends and choose the right sense for them in context. We show that our approach improves SMT lexical choice for false friends across language variants. On Egyptian-English SMT, it reduces word error rate (WER) and position-independent error rate (PER) by 0.6% and 0.1%, respectively, compared to the baseline.
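The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, a single reference per sentence, and one common formulation of PER (bag-of-words overlap normalized by reference length). The function names are hypothetical, and the abstract does not say which scoring tool or PER variant the authors used, so this is not a reproduction of their evaluation setup.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


def position_independent_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed PER variant: (max(|ref|, |hyp|) - bag-of-words matches) / |ref|."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection counts words shared regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

Because PER ignores word order, a hypothesis that contains the right words in the wrong positions is penalized by WER but not by PER. A lexical-choice improvement such as picking the correct sense of a false friend would, in principle, show up in both.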

Similar resources

Translation-oriented Word Sense Induction Based on Parallel Corpora

Word Sense Disambiguation (WSD) is an intermediate task that serves as a means to an end defined by the application in which it is to be used. However, different applications have varying disambiguation needs which should have an impact on the choice of the method and of the sense inventory used. The tendency towards application-oriented WSD becomes more and more evident, mostly because of the ...

Contextual Modeling for Meeting Translation Using Unsupervised Word Sense Disambiguation

In this paper we investigate the challenges of applying statistical machine translation to meeting conversations, with a particular view towards analyzing the importance of modeling contextual factors such as the larger discourse context and topic/domain information on translation performance. We describe the collection of a small corpus of parallel meeting data, the development of a statistica...

Disambiguation of partial cognates

Cognates – words that have similar spelling and meaning in two or more languages – can accelerate vocabulary acquisition and facilitate the reading comprehension task. A student has to pay attention to the pairs of words that look and sound similar but have different meanings – false-friend pairs, and especially to pairs of words that share meanings in some but not all contexts – partial cognat...

Unsupervised Sense Disambiguation Using Bilingual Probabilistic Models

We describe two probabilistic models for unsupervised word-sense disambiguation using parallel corpora. The first model, which we call the Sense model, builds on the work of Diab and Resnik (2002) that uses both parallel text and a sense inventory for the target language, and recasts their approach in a probabilistic framework. The second model, which we call the Concept model, is a hierarchica...

UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness

In this paper we describe an unsupervised WordNet-based Word Sense Disambiguation system, which participated (as UMND1) in the SemEval-2007 Coarse-grained English Lexical Sample task. The system disambiguates a target word by using WordNet-based measures of semantic relatedness to find the sense of the word that is semantically most strongly related to the senses of the words in the context of t...


Publication date: 2015